Abstract

We explore clustering problems in the streaming sliding window model in both general metric spaces
and Euclidean space. We present the first polylogarithmic space $O(1)$-approximation to the metric
k-median and metric k-means problems in the sliding window model, answering the main open problem
posed by Babcock, Datar, Motwani and O’Callaghan [5], which has remained unanswered for over a
decade. Our algorithm uses $O(k^3 \log^6 n)$ space and $\mathrm{poly}(k, \log n)$ update time. This is an
exponential improvement on the space required by the technique due to Babcock et al. We introduce a
data structure that extends smooth histograms, as introduced by Braverman and Ostrovsky, to operate
on a broader class of functions. In particular, we show that using only polylogarithmic space we can
maintain a summary of the current window from which we can construct an $O(1)$-approximate
clustering solution.

Merge-and-reduce is a generic method in computational geometry for adapting offline algorithms
to the insertion-only streaming model. Several well-known coreset constructions are maintainable in
the insertion-only streaming model using this method, including coreset techniques for the
k-median and k-means problems in both low- and high-dimensional Euclidean spaces [29, 13]. Previous
work has adapted these techniques to the insertion-deletion model, but translating them to the
sliding window model has remained a challenge. We give the first algorithm that, given an
insertion-only streaming coreset construction of space $s$, maintains a $(1 \pm \epsilon)$-approximate
coreset in the sliding window model using $O(s^2 \epsilon^{-2} \log n)$ space.

For clustering problems, our results constitute the first significant step towards resolving problem
number 20 from the List of Open Problems in Sublinear Algorithms llJTl . 
1 Introduction

Over the past two decades, the streaming model of computation |[^ has emerged as a popular framework 
in which to develop algorithms for large data sets. In the streaming model, we are restricted to using space
sublinear in the size of the input, and this input must typically be processed in a single pass. While
the streaming model is broadly useful, it is inadequate for domains in which data is time-sensitive such as 
network monitoring ifldlfTSlfTTl and event detection in social media 1(361 . In these domains, elements of the 
stream appearing more recently are in some sense more relevant to the computation being performed. The 
sliding window model was developed to capture this situation [20]. In this model, the goal is to maintain a
computation on only the most recent W elements of the stream, rather than on the stream in its entirety. 

We consider the problem of clustering in the sliding window model. Algorithms have been developed
for a number of streaming clustering problems, including k-median [26, 12, 29, 25], k-means [13, 23]
and facility location [19]. However, while the sliding window model has received renewed attention
recently [18, 6], no major clustering results in this model have been published since Babcock, Datar,
Motwani and O’Callaghan [5] presented a solution to the k-median problem. Polylogarithmic space
k-median algorithms exist in the insertion-only streaming model [12, 29] and the insertion-deletion
model [30, 25, 31], but no analogous result has appeared to date for the sliding window model. Indeed,
the following question by Babcock et al. has remained open for more than a decade:

Whether it is possible to maintain approximately optimal medians in polylogarithmic space (as
Charikar et al. [12] do in the stream model without sliding windows), rather than polynomial
space, is an open problem.

Much progress on streaming clustering problems in Euclidean space has been due to coresets
[29, 13, 25, 21, 24]. But, similarly to the metric case, methods for maintaining coresets on sliding
windows for Euclidean clustering problems have been hard to come by. Streaming insertion-only coreset
techniques exist for the Euclidean k-median and k-means problems in low- and high-dimensional spaces,
but to our knowledge no such results exist for the sliding window model. We present the first such
technique, a general framework in which one can build coresets for a broad class of clustering
problems in Euclidean space.

1.1 Our Contribution 

Metric clustering problems in the sliding window model. We present the first polylogarithmic space
$O(1)$-approximation for the metric k-median problem in the sliding window model, answering the
question posed by Babcock et al. [5]. Our algorithm uses $O(k^3 \log^6 n)$ space and requires update
time $O(\mathrm{poly}(k, \log n))$, with the exact update time depending on which of the many existing
offline $O(1)$-approximations for k-median one chooses. We also demonstrate how this result extends to
a broad class of related clustering problems, including k-means. The one requirement of our result is a
polynomial bound on the ratio of the optimal cost to the minimum inter-point distance on any window
of the stream.

Braverman and Ostrovsky [11] introduced smooth histograms as a method for adapting insertion-only
streaming algorithms to the sliding window model for a certain class of functions, which Braverman and
Ostrovsky call smooth. Unfortunately, the k-median and k-means costs are not smooth functions, so smooth
histograms cannot be directly applied. Our major technical contribution lies in the extension of smooth
histograms [11] to a class of clustering functions, including k-median and k-means clustering, that are
less well-behaved than smooth functions. We show that the k-median and k-means clustering problems do
possess a property similar to smoothness, provided that a pair of conditions hold related to the cluster
cardinalities and costs of clustering solutions built on certain subsets of the stream. We develop a
streaming data structure that ensures that these two conditions are satisfied where necessary, so that
the core ideas behind the smooth histogram data structure can be brought to bear. Using the algorithms
of [12] and [11], we show that the bookkeeping necessary for our approach can be maintained using
polylogarithmic space and time.
Euclidean clustering problems in the sliding window model. Merge-and-reduce is a generic method in
computational geometry to implement offline algorithms in the insertion-only streaming model. Several
well-known coreset constructions are conducive to this method, including coreset techniques
for the k-median and k-means problems in both low- and high-dimensional Euclidean spaces [29, 13]. We
develop a sliding window algorithm that, given one of these insertion-only streaming coresets of size $s$,
maintains this coreset in the sliding window model using $O(s^2 \epsilon^{-2} \log n)$ space.

To develop our generic framework, we consider a sequence $X$ of indices of arrival times of points, as in
smooth histograms [8]. For each index $x_i \in X$ we maintain a coreset (using merge-and-reduce) of the
points whose arrival times are between $x_i$ and the current time $N$. In particular, as the points in the
interval $[x_i, N]$ arrive, we compute coresets for small subsets of points. Once the number of these
coresets is big enough, we merge these coresets and reduce them by computing a coreset on their union.
This yields a tree whose root is a coreset of its leaves, which contain subsets of the points in the
interval $[x_i, N]$. The well-known coreset techniques of [29, 13, 21, 24] mostly partition the space into
a small set of regions and from each region take a small number of points either randomly or
deterministically. Hence, at the root of the merge-and-reduce tree, we have a partition from whose
regions we take weighted coreset points.
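
The merge-and-reduce tree described above can be sketched as follows. This is a minimal illustration, not the construction used in the paper: the class name `MergeReduce`, the buffer size `m`, and the toy 1-D reduction `reduce_pairs` (which merges sorted neighbours into their weighted mean) are all assumptions made for the example, and the toy reduction carries no coreset guarantee. Only the binary-counter structure of the tree is the point.

```python
def reduce_pairs(items, m):
    """Toy 1-D 'reduce' step (hypothetical): merge sorted neighbours into
    their weighted mean until at most m weighted points remain."""
    items = sorted(items)
    while len(items) > m:
        merged = []
        for a in range(0, len(items) - 1, 2):
            (p, wp), (q, wq) = items[a], items[a + 1]
            merged.append(((p * wp + q * wq) / (wp + wq), wp + wq))
        if len(items) % 2:                 # odd element is carried over unchanged
            merged.append(items[-1])
        items = merged
    return items


class MergeReduce:
    """Keeps at most one summary of size <= m per tree level."""

    def __init__(self, m):
        self.m = m
        self.buffer = []                   # raw points not yet summarised
        self.levels = {}                   # level -> list of (point, weight)

    def insert(self, p):
        self.buffer.append((p, 1.0))
        if len(self.buffer) < self.m:
            return
        summary, level = self.buffer, 0
        self.buffer = []
        # Like carrying in a binary counter: merge equal-level summaries,
        # reduce their union, and push the result one level up.
        while level in self.levels:
            summary = reduce_pairs(self.levels.pop(level) + summary, self.m)
            level += 1
        self.levels[level] = summary

    def summary(self):
        """Union of the buffer and all per-level summaries."""
        out = list(self.buffer)
        for s in self.levels.values():
            out.extend(s)
        return out
```

With buffer size `m`, a stream of `n` points occupies at most about $\log_2(n/m) + 1$ levels, so the stored summary has size $O(m \log n)$ while the total weight always equals the number of points seen.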

The crux of the smooth histogram data structure [8] is to maintain a small set of indices $X$ such that
for every two consecutive indices $x_i$ and $x_{i+1}$, all intervals $[x_i + 1, N], \ldots, [x_{i+1} - 1, N]$
have clustering costs that are within a $(1 + \epsilon)$ factor of each other. By keeping only the indices
$x_i, x_{i+1}$, we smooth the cost between these two indices. To this end, we look at the partition of the
root of the merge-and-reduce tree corresponding to arrival time $t \in [x_i, x_{i+1}]$. If there is a region
in this partition with at most an $\epsilon$-fraction of its points in the interval $[x_i, x_{i+1}]$, then we
ignore the interval $[t, N]$; otherwise we add index $t$ to $X$ and keep the coreset of points for the
interval $[t, N]$. We show, using a novel application of VC-dimension and $\epsilon$-sample theory [3], that
if we take small random samples from every region of the partitions in the intermediate nodes of the
merge-and-reduce tree, then the coreset points inside every region of the partition of the root of this
tree are a good approximation of the original points in that region. Thus, testing whether an
$\epsilon$-fraction of the coreset points of a region are in the interval $[x_i, x_{i+1}]$ is a good
approximation for testing whether an $\epsilon$-fraction of the original points of the region are in this
interval. We also show that if, for every region in the partition, at most an $\epsilon$-fraction of its
points are in the interval $[x_i, x_{i+1}]$, then ignoring the interval $[t, N]$ and keeping only the
intervals $[x_i, N]$ and $[x_{i+1}, N]$ loses at most an $\epsilon$-fraction of the clustering cost of the
points in the interval $[t, N]$.

Frahling and Sohler [25] developed a coreset technique for k-median and k-means problems in
low-dimensional spaces in the dynamic geometric stream (i.e., a stream of insertions and deletions of
points) using the heavy hitters algorithm [16] and a logarithmic sampling rate. We observe that we can
maintain their coreset in the sliding window model using the heavy hitter algorithm and sampling
techniques proposed for the sliding window model due to Braverman, Ostrovsky and Zaniolo [9]. However,
their approach does not work for other well-known coreset techniques for Euclidean spaces
[29, 13, 21, 24], motivating the need for a different technique, which we develop in this paper.

1.2 Related Work 

Guha, Mishra, Motwani and O’Callaghan [26] presented the first insertion-only streaming algorithm for
the k-median problem. They gave a $2^{O(1/\epsilon)}$-approximation using $O(n^{\epsilon})$ space, where
$\epsilon < 1$. Charikar, O’Callaghan, and Panigrahy [12] subsequently developed an $O(1)$-approximation
insertion-only streaming algorithm using $O(k \log^2 n)$ space. Their approach operates in phases,
similarly to [26], maintaining a set of $O(k \log n)$ candidate centers that are reduced to exactly k
centers using an offline k-median algorithm after the entire stream has been observed.

Slightly stronger results hold when the elements of the stream are points in d-dimensional Euclidean
space. Har-Peled and Mazumdar [29] developed a $(1 + \epsilon)$-approximation for k-median and k-means in
the insertion-only streaming model using (strong) coresets. Informally, a strong $(k, \epsilon)$-coreset for
k-median is a weighted subset $S$ of some larger set of points $P$ that enables us to (approximately)
evaluate the quality of a candidate solution on $P$ using small space. The coresets presented by
Har-Peled and Mazumdar [29] required $O(k \epsilon^{-d} \log n)$ space, yielding a streaming solution using
$O(k \epsilon^{-d} \log^{2d+2} n)$ space via the famous
merge-and-reduce approach (Tllll. Har-Peled and Kushal ll2^ later developed coresets of size 
for k-median and k-means problems. Unfortunately, these new coresets do not result in significant space 
improvements in the streaming model. Feldman, Fiat and Sharir [21] later extended this type of coreset to
the case where centers can be lines or flats. 

In high-dimensional spaces, Chen [13] presented a technique for building $(k, \epsilon)$-coresets of size
$O(k^2 d \epsilon^{-2} \log^2 n)$, yielding via merge-and-reduce a streaming algorithm requiring
$O(k^2 d \epsilon^{-2} \log^8 n)$ space. Chen [13] also presented a technique for general metric spaces,
which, with probability of success $1 - \delta$, produces a coreset of size
$O(k \epsilon^{-2} \log n \, (k \log n + \log(1/\delta)))$.

To the best of our knowledge, there do not exist any results to date for the k-median problem on arbitrary
metric spaces (often simply called the metric k-median problem) in the insertion-deletion streaming model.
In the geometric case, introduced by Indyk [30] under the name dynamic geometric data streams, Frahling
and Sohler [25] have shown (via a technique distinct from that in [29, 28]) that one can build a
$(k, \epsilon)$-coreset
for k-median or k-means using log^ n) or 0(ke“‘^^^ logn) space, respectively. 

In comparison to the insertion-only and dynamic geometric streaming models, little is known about the
metric k-median problem in the sliding window model, where the goal is to maintain a solution on the most
recent $W$ elements of the data stream. To our knowledge, the only existing solution under this model is
the $2^{O(1/\tau)}$-approximation given in [5], where $\tau \in (0, 1/2)$ is a user-specified parameter. The
solution presented therein requires $O(\frac{k}{\tau^4} W^{2\tau} \log^2 W)$ space and yields an initial
solution using 2k centers, which is then pared down to k centers with no loss to the approximation factor.

Outline. The remainder of the paper is organized as follows: Section 2 establishes notation and definitions.
Our main results are presented in Section 3, which gives an algorithm for the k-median problem on sliding
windows, and Section 4, which presents an algorithm for maintaining coresets for Euclidean clustering
problems on sliding windows. Additional results and proof details are included in the Appendix.

2 Preliminaries 

We begin by defining the clustering problems of interest and establishing notation. 

2.1 Metric and Geometric k-Median Problems 

Let $(X, d)$ be a metric space where $X$ is a set of points and $d : X \times X \to \mathbb{R}$ is a distance
function defined over the points of $X$. For a set $Q \subseteq X$, we let $d(p, Q) = \min_{q \in Q} d(p, q)$
denote the distance between a point $p \in X$ and set $Q$, and we denote by
$\rho(Q) = \min_{p, q \in Q,\, p \neq q} d(p, q)$ the minimum distance between distinct points in $Q$. We
define $[a] = \{1, 2, 3, \ldots, a\}$ and $[a, b] = \{a, a + 1, a + 2, \ldots, b\}$ for natural numbers
$a \le b$. When there is no danger of confusion, we denote the set of points
$\{p_a, p_{a+1}, \ldots, p_b\} \subseteq X$ by simply $[a, b]$. For example, for a function $f$ defined on
sets of points, we denote $f(\{p_a, p_{a+1}, \ldots, p_b\})$ by simply $f([a, b])$.

Definition 1 (Metric k-median) Let $P$ be a set of n points in metric space $(X, d)$ and let
$C = \{c_1, \ldots, c_k\} \subseteq X$ be a set of k points called centers. A clustering of point set $P$
using $C$ is a partition of $P$ such that a point $p \in P$ is in partition $P_i$ if $c_i \in C$ is the
nearest center in $C$ to $p$, with ties broken arbitrarily. We call each $P_i$ a cluster. The k-median cost
using centers $C$ is $\mathrm{COST}(P, C) = \sum_{p \in P} d(p, C)$. The metric k-median problem is to find a
set $C^* \subseteq P$ of k centers satisfying
$\mathrm{COST}(P, C^*) = \min_{C \subseteq P,\, |C| = k} \mathrm{COST}(P, C)$. We let
$\mathrm{OPT}(P, k) = \min_{C \subseteq P,\, |C| = k} \mathrm{COST}(P, C)$ denote this optimal k-median cost
for $P$.
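
As a concrete reading of Definition 1, the following sketch evaluates the k-median cost and finds the optimal cost by brute force over all k-subsets of the input. The function names `cost` and `opt` and the 1-D example are assumptions for illustration; the exhaustive search is exponential in k and is only meant to make the definition checkable on tiny inputs.

```python
from itertools import combinations


def cost(P, C, d):
    """COST(P, C): sum over points of the distance to the nearest center."""
    return sum(min(d(p, c) for c in C) for p in P)


def opt(P, k, d):
    """OPT(P, k) with centers restricted to P, by exhaustive search."""
    k = min(k, len(P))
    return min(cost(P, C, d) for C in combinations(P, k))


# Tiny 1-D example with the absolute difference as the metric.
dist = lambda a, b: abs(a - b)
points = [0, 1, 2, 10, 11]
best = opt(points, 2, dist)   # centers {1, 10} (or {1, 11}) are optimal
```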

Definition 2 ((Euclidean) k-median Clustering) Let $P$ be a set of n points in a d-dimensional Euclidean
space $\mathbb{R}^d$ and let k be a natural number. In the k-median problem, the goal is to find a set
$C = \{c_1, \ldots, c_k\} \subset \mathbb{R}^d$ of k centers that minimizes the cost
$\mathrm{COST}(P, C) = \sum_{p \in P} d(p, C)$, where $d(p, C) = \min_{c_i \in C} d(p, c_i)$ and $d(p, c_i)$ is
the Euclidean distance between $p$ and $c_i$.
Definition 3 (($k, \epsilon$)-Coreset for k-Median Clustering) Let $P$ be a set of n points in d-dimensional
Euclidean space $\mathbb{R}^d$ and let k be a natural number. A set $S \subseteq \mathbb{R}^d$ is a
$(k, \epsilon)$-coreset for k-median clustering if for every $C = \{c_1, \ldots, c_k\} \subset \mathbb{R}^d$
of k centers we have $|\mathrm{COST}(P, C) - \mathrm{COST}(S, C)| \le \epsilon \cdot \mathrm{COST}(P, C)$.
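
The coreset condition quantifies over every set of k centers, which is easy to overlook. The sketch below, under the assumption that the coreset is a weighted set of 1-D points (the weights and the names `weighted_cost` and `coreset_error` are illustrative, not from the definition), measures the relative error for one candidate set of centers. In the example, a two-point summary matches the full cost for centers far from the data but fails badly when the centers sit on the summary points, so it is not a $(2, \epsilon)$-coreset for small $\epsilon$.

```python
def weighted_cost(S, C):
    """Cost of a weighted point set S = [(point, weight), ...] w.r.t. centers C."""
    return sum(w * min(abs(p - c) for c in C) for p, w in S)


def coreset_error(P, S, C):
    """Relative error |COST(P,C) - COST(S,C)| / COST(P,C); assumes COST(P,C) > 0."""
    full = sum(min(abs(p - c) for c in C) for p in P)
    return abs(full - weighted_cost(S, C)) / full


P = [0.0, 0.1, 0.2, 10.0, 10.1]
S = [(0.1, 3), (10.05, 2)]              # cluster means weighted by cluster size

far = coreset_error(P, S, [1.0, 9.0])   # centers outside both clusters
near = coreset_error(P, S, [0.1, 10.05])  # centers exactly on the summary points
```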

2.2 The Sliding Window Model 

Let $(X, d)$ be a metric space and $P \subseteq X$ a point set of size $|P| = n$. In the insertion-only streaming
model ElliKia, we think of a (possibly adversarial) permutation pi,P2, •'' > Pn of P, presented as a data 
stream. We assume that we have some function f, defined on sets of points. The goal is then to compute the 
(approximate) value of f evaluated on the stream, using o(n) space. We say that point pN arrives at time N. 

The sliding window model [20] is a generalization of the insertion-only streaming model in which we
seek to compute function $f$ over only the most recent elements of the stream. Given a current time $N$,
we consider a window $\mathcal{W}$ of size $W$ consisting of points $p_s, p_{s+1}, \ldots, p_N$, where
$s = \max\{1, N - W + 1\}$. We assume that $W$ is such that we cannot store all of window $\mathcal{W}$ in
memory. A point $p_i$ in the current window $\mathcal{W}$ is called active. At time $N$, a point $p_i$ for
which $i < N - W + 1$ is called expired.
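
The index arithmetic of the window definition can be written down directly. This is a small sketch with 1-based arrival times; the function names are illustrative.

```python
def window_start(N, W):
    """First active index s = max(1, N - W + 1) at current time N."""
    return max(1, N - W + 1)


def is_active(i, N, W):
    """True if point p_i lies in the current window at time N."""
    return window_start(N, W) <= i <= N


def is_expired(i, N, W):
    """True if point p_i has fallen out of the window at time N."""
    return i < N - W + 1
```

For example, at time $N = 10$ with $W = 4$ the active points are $p_7, \ldots, p_{10}$, and before the window has filled (say $N = 3$) every point seen so far is active.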

2.3 Smooth Functions and Smooth Histograms 

Definition 4 (($\epsilon, \epsilon'$)-smooth function [8]) Let $f$ be a function defined on sets of points,
and let $\epsilon, \epsilon' \in (0, 1)$. We say $f$ is an $(\epsilon, \epsilon')$-smooth function if $f$ is
non-negative (i.e., $f(A) \ge 0$ for all sets $A$), non-decreasing (i.e., for $A \subseteq B$,
$f(A) \le f(B)$), and polynomially bounded (i.e., there exists a constant $c > 0$ such that
$f(A) = O(|A|^c)$), and for all sets $A, B, C$

$f(B) \ge (1 - \epsilon) f(A \cup B)$ implies $f(B \cup C) \ge (1 - \epsilon') f(A \cup B \cup C)$.

Interestingly, a broad class of functions fit this definition. For instance, sum, count, minimum, diameter, 
Lp-norms, frequency moments and the length of the longest subsequence are all smooth functions. 

Braverman and Ostrovsky [8] proposed a data structure called smooth histograms to maintain smooth
functions on sliding windows.

Definition 5 (Smooth histogram [8]) Let $0 < \epsilon < 1$, $0 < \epsilon' < 1$ and $\alpha > 0$, and let $f$
be an $(\epsilon, \epsilon')$-smooth function. Suppose that there exists an insertion-only streaming
algorithm $\mathcal{A}$ that computes an $\alpha$-approximation $f'$ of $f$. The smooth histogram consists of
an increasing set of indices $X_N = \{x_1, x_2, \ldots, x_t = N\}$ and $t$ instances
$\mathcal{A}_1, \mathcal{A}_2, \ldots, \mathcal{A}_t$ of $\mathcal{A}$ such that

(1) Either $p_{x_1}$ is expired and $p_{x_2}$ is active, or $x_1 = 0$.

(2) For $1 \le i \le t - 1$ one of the following holds:

(a) $x_{i+1} = x_i + 1$ and $f'([x_{i+1}, N]) < (1 - \epsilon') f'([x_i, N])$,

(b) $f'([x_{i+1}, N]) \ge (1 - \epsilon) f'([x_i, N])$ and, if $i \in [t - 2]$, $f'([x_{i+2}, N]) < (1 - \epsilon') f'([x_i, N])$.

(3) $\mathcal{A}_i = \mathcal{A}([x_i, N])$ maintains $f'([x_i, N])$.

Observe that the first two elements of sequence $X_N$ always sandwich the current window $\mathcal{W}$, in
the sense that $x_1 \le N - W < x_2$. Braverman and Ostrovsky [8] used this observation to show that at any
time $N$, one of either $f'([x_1, N])$ or $f'([x_2, N])$ is a good approximation to $f'([N - W, N])$, and is
thus a good approximation to $f([N - W, N])$. In particular, they proved the following theorem.

Theorem 6 ([8]) Let $0 < \epsilon, \epsilon' < 1$ and $\alpha, \beta > 0$, and let $f$ be an
$(\epsilon, \epsilon')$-smooth function. Suppose there exists an insertion-only streaming algorithm
$\mathcal{A}$ that calculates an $\alpha$-approximation $f'$ of $f$ using $g(\alpha)$ space and $h(\alpha)$
update time. Then there exists a sliding window algorithm that maintains a
$(1 \pm (\alpha + \epsilon))$-approximation of $f$ using $O(\beta^{-1} \cdot \log n \cdot (g(\alpha) + \log n))$
space and $O(\beta^{-1} \cdot \log n \cdot h(\alpha))$ update time.
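
To make the data structure concrete, here is a simplified smooth histogram for one of the smooth functions listed above, the sum of non-negative numbers. It is a sketch under several assumptions that depart from Definition 5: the streaming algorithm is an exact running sum (so $\alpha$ plays no role), a single parameter `beta` replaces the pair $(\epsilon, \epsilon')$, and the class name `SmoothHistogramSum` is hypothetical. It keeps a start index only when the suffix sums of its neighbours differ by more than a $(1 - \beta)$ factor, and answers queries from the oldest kept index.

```python
class SmoothHistogramSum:
    """Sliding-window sum of non-negative numbers (simplified smooth histogram)."""

    def __init__(self, window, beta):
        self.W = window      # window size W
        self.beta = beta     # tolerated relative gap between neighbouring suffix sums
        self.N = 0           # current time
        self.idx = []        # increasing start indices x_1 < x_2 < ... < x_t = N
        self.val = []        # val[i] = exact sum of p_{x_i}, ..., p_N

    def insert(self, v):
        self.N += 1
        self.val = [s + v for s in self.val]   # every suffix absorbs the new item
        self.idx.append(self.N)
        self.val.append(v)
        # Prune: after x_i keep only the farthest x_j whose suffix sum is
        # still at least (1 - beta) times the suffix sum starting at x_i.
        i = 0
        while i < len(self.idx) - 2:
            j = i + 1
            while (j + 1 < len(self.idx)
                   and self.val[j + 1] >= (1 - self.beta) * self.val[i]):
                j += 1
            del self.idx[i + 1:j]
            del self.val[i + 1:j]
            i += 1
        # Expire: drop x_1 once x_2 also starts at or before the window start.
        start = max(1, self.N - self.W + 1)
        while len(self.idx) >= 2 and self.idx[1] <= start:
            self.idx.pop(0)
            self.val.pop(0)

    def query(self):
        """Sum over [x_1, N]: at least the window sum, at most 1/(1 - beta) times it."""
        return self.val[0]
```

Because $x_1$ is at or before the window start and $x_2$ is strictly inside the window, the true window sum is sandwiched between the two oldest stored suffix sums, which is the observation used in the proof of Theorem 6.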

VC-Dimension and $\epsilon$-Sample. We briefly review the definitions of VC-dimension and $\epsilon$-sample,
as presented in Alon and Spencer’s book [3], in Appendix B.
3 Metric k-Median Clustering in Sliding Windows

We introduce the first polylogarithmic-space $O(1)$-approximation for metric k-median clustering in the
sliding window model. Our algorithm requires $O(k^3 \log^6 n)$ space and $O(\mathrm{poly}(k, \log n))$ update
time. We note that our algorithm is easily modified to accommodate k-means clustering. This modified
algorithm will have the same time and space bounds with a larger approximation ratio that is nevertheless
still $O(1)$.

3.1 Smoothness 

k-median and k-means are not smooth (see the Appendix for an example), so the techniques of [8] do not
apply directly, but Lemma 9 shows that k-median clustering does possess a property similar to smoothness.
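
One way to see the failure of smoothness is a small 1-median instance on the line. This is an illustration constructed here, not necessarily the example given in the Appendix; the helper `opt_line` is a hypothetical brute-force solver with centers restricted to input locations. The sets A and B sit at the same location, so adding A to B costs nothing, yet once a heavier set C arrives elsewhere, every point of A must pay to reach the new center.

```python
from itertools import combinations


def opt_line(P, k):
    """Optimal k-median cost on the line, centers chosen among input locations."""
    locations = sorted(set(P))
    k = min(k, len(locations))
    return min(sum(min(abs(p - c) for c in C) for p in P)
               for C in combinations(locations, k))


A = [0] * 9      # nine early points at location 0
B = [0]          # one later point at location 0
C = [1] * 20     # a heavy later cluster at location 1

premise = (opt_line(B, 1), opt_line(A + B, 1))          # both costs are 0
conclusion = (opt_line(B + C, 1), opt_line(A + B + C, 1))
```

Here $f(B) = f(A \cup B) = 0$, so the premise of Definition 4 holds for every $\epsilon$, but $f(B \cup C) = 1$ while $f(A \cup B \cup C) = 10$, so the conclusion fails for every $\epsilon' < 0.9$. Note that the cluster at location 0 contains more points of A than of B, which is the situation a cardinality condition on clusters rules out.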

Definition 7 ($\lambda$-approximate Triangle Inequality) A non-negative symmetric function
$d : X \times X \to \mathbb{R}_{\ge 0}$ satisfies the $\lambda$-approximate triangle inequality if
$d(a, c) \le \lambda (d(a, b) + d(b, c))$ for every $a, b, c \in X$.

We note that a metric $d$ satisfies the 1-approximate triangle inequality by definition and that for any
$p \ge 1$, $d^p$ obeys the $2^{p-1}$-approximate triangle inequality, since
$(x + y)^p \le 2^{p-1}(x^p + y^p)$ for all non-negative $x, y$ and $p \ge 1$. Thus, if $p = O(1)$, Theorem 15
provides an $O(1)$-approximation for the clustering objective $\sum_{x \in P} d^p(x, C)$. The case $p = 2$
yields an $O(1)$-approximate solution for k-means.
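
A quick numerical check of the claim for $p = 2$: squared distance on the line violates the ordinary triangle inequality but satisfies the 2-approximate one, as $2^{p-1} = 2$ predicts. The helper name `approx_triangle_holds` and the sample points are assumptions for the example; a finite check illustrates the statement and does not prove it.

```python
def approx_triangle_holds(d, pts, lam, tol=1e-12):
    """Check d(a, c) <= lam * (d(a, b) + d(b, c)) for all triples from pts."""
    return all(d(a, c) <= lam * (d(a, b) + d(b, c)) + tol
               for a in pts for b in pts for c in pts)


squared = lambda a, b: (a - b) ** 2      # d^p with p = 2 on the real line
pts = [0.0, 0.5, 1.0, 3.0, 7.5]

holds_with_2 = approx_triangle_holds(squared, pts, 2.0)   # lambda = 2^(p-1)
holds_with_1 = approx_triangle_holds(squared, pts, 1.0)   # fails, e.g. 0, 0.5, 1
```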

Definition 8 Let $P$ and $C = \{c_1, \ldots, c_k\}$ be sets of points from metric space $(X, d)$. A map
$t : P \to C$ is called a clustering map of $P$ for set $C$. If
$\sum_{x \in P} d(x, t(x)) \le \beta \cdot \mathrm{OPT}(P, k)$, then we say that $t$ is a
$\beta$-approximate clustering map of $P$ for set $C$. The difference between a clustering map $t(x)$ and
the intuitive map $\arg\min_{c \in \{c_1, \ldots, c_k\}} d(x, c)$ is that $t$ need not map each point $x$ to
its nearest center.

Lemma 9 Let $d$ be a non-negative symmetric function on $X \times X$ satisfying the $\lambda$-approximate
triangle inequality and let $A, B \subseteq X$ be two disjoint sets of points such that

(1) $\mathrm{OPT}(A \cup B, k) \le \gamma \, \mathrm{OPT}(B, k)$.

(2) There exists a $\beta$-approximate clustering map $t$ of $A \cup B$ such that
$\forall i \in [k] : |t^{-1}(c_i) \cap A| \le |t^{-1}(c_i) \cap B|$.

Then for any $C \subseteq X$ we have
$\mathrm{OPT}(A \cup B \cup C, k) \le (1 + \lambda + \beta\gamma\lambda) \, \mathrm{OPT}(B \cup C, k)$.

Proof: Let $s$ be an optimal clustering map for $B \cup C$ (i.e., $s$ is 1-approximate). Then

$\mathrm{OPT}(A \cup B \cup C, k) \le \sum_{a \in A} d(a, s(a)) + \sum_{x \in B \cup C} d(x, s(x))$.

The second term is $\mathrm{OPT}(B \cup C, k)$, and we can bound the first term by connecting each element
of $t^{-1}(c_i) \cap A$ to a unique element in $t^{-1}(c_i) \cap B$, applying the $\lambda$-approximate
triangle inequality, and using the fact that $|t^{-1}(c_i) \cap A| \le |t^{-1}(c_i) \cap B|$ for all
$i \in [k]$. Details are provided in the Appendix. □

If we could ensure that the inequality conditions required by Lemma 9 hold, then we could apply the
ideas from smooth histograms. The following two lemmas suggest a way to do this.

Lemma 10 Let $A \cup B$ be a set of n points received in an insertion-only stream, appearing in two phases
so that all points in $A$ arrive before all points in $B$, and assume that the algorithm is notified when
the last point from $A$ arrives. Using $O(k \log^2 n)$ space, it is possible to compute an
$O(1)$-approximate clustering map $t$ for $A \cup B$ as well as the exact values of
$\{(|t^{-1}(c_i) \cap A|, |t^{-1}(c_i) \cap B|)\}_{i \in [k]}$.
Proof: Given a set of points $P$ presented in a stream $D$, the PLS algorithm presented in [12] uses
$O(k \log^2 n)$ space to compute a weighted set $S$ such that
$\mathrm{COST}(D, S) \le \alpha \, \mathrm{OPT}(D, k)$ for some constant $\alpha$ ([12], Theorem 1). Using a
theorem from [26], it is shown in [12] that $\mathrm{OPT}(S, k) \le 2(1 + \alpha) \mathrm{OPT}(D, k)$. It
follows immediately that running an offline $\xi$-approximation for k-median on $S$ yields a set of k
centers that constitutes an $(\alpha + 2\xi(1 + \alpha))$-approximate k-median solution for the original
stream $D$.

The PLS algorithm uses as a subroutine the online facility location algorithm due to Meyerson [33].
Thus, each point in $S$ can be viewed as a facility serving one or more points in $P$. Therefore, running
the PLS algorithm on stream $D$ yields a map $r : P \to S$ such that $r(p) \in S$ is the facility that
serves point $p \in P$. Running a $\xi$-approximation on the set $S$ to obtain centers
$\{c_1, \ldots, c_k\}$ yields a map $q : S \to \{c_1, \ldots, c_k\}$ such that point $s \in S$ is connected
to $q(s)$.

Given disjoint multisets $A$ and $B$, PLS yields maps $r_A : A \to S_A$ and $r_B : B \to S_B$. Running the
$\xi$-approximation on $S_A \cup S_B$, we obtain a map $q : S_A \cup S_B \to \{c_1, \ldots, c_k\}$, from
which we have a $\beta$-approximate clustering map of $A \cup B$ given by $t(x) = q(r_A(x))$ if $x \in A$
and $t(x) = q(r_B(x))$ if $x \in B$, from which we can directly compute
$|t^{-1}(c_i) \cap A| = |q^{-1}(c_i) \cap S_A|$ and similarly for $|t^{-1}(c_i) \cap B|$. □
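
The counting step at the end of this proof can be sketched as follows, assuming each phase's weighted set is stored as a dictionary from facility to the number of points it serves, and the offline solution is a dictionary `q` from facility to center. The data layout and the names `cluster_counts` and `cardinality_condition` are assumptions for illustration; the point is that the per-cluster counts come from facility weights alone, without storing the original points.

```python
from collections import Counter


def cluster_counts(weights, q):
    """Number of phase points routed to each center via q, from facility weights."""
    counts = Counter()
    for facility, w in weights.items():
        counts[q[facility]] += w
    return counts


def cardinality_condition(in_A, in_B, centers):
    """Condition (2) of Lemma 9: each cluster has no more points of A than of B."""
    return all(in_A[c] <= in_B[c] for c in centers)


w_A = {"a1": 3, "a2": 1}                 # facilities opened during phase A
w_B = {"b1": 2, "b2": 4}                 # facilities opened during phase B
q = {"a1": "c1", "a2": "c2", "b1": "c1", "b2": "c2"}

in_A = cluster_counts(w_A, q)
in_B = cluster_counts(w_B, q)
ok = cardinality_condition(in_A, in_B, ["c1", "c2"])
```

In this example center `c1` receives three points of A but only two of B, so the condition fails and a new phase boundary would be needed.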

The previous lemma showed how we can check whether condition (2) in Lemma 9 holds over two phases.
We now extend this to the case where the stream has an arbitrary number of phases.

Lemma 11 Let $A_1 \cup \cdots \cup A_Z$ be a set of n points in an insertion-only stream, arriving in phases
$A_1, A_2, \ldots, A_Z$, and assume that the algorithm is notified at the end of each phase $A_i$. Using
$O(Z^2 k \log^2 n)$ space, one can compute for every $1 \le j < \ell \le Z$ a $\beta$-approximate clustering
map $t_{j,\ell}$ for $A_j \cup \cdots \cup A_Z$ and the exact values
$\{(|t_{j,\ell}^{-1}(c_i) \cap (A_j \cup \cdots \cup A_{\ell-1})|, |t_{j,\ell}^{-1}(c_i) \cap (A_\ell \cup \cdots \cup A_Z)|)\}_{i \in [k]}$
for that map.

Proof: This lemma is a natural extension of Lemma 10. Details of the proof are in the Appendix. □

Lemma 11 suggests one way to ensure that the conditions of Lemma 9 are met: simply treat every point
as a phase. However, this would require running $O(W^2)$ instances of PLS, which would be infeasible. We
would like to ensure that conditions (1) and (2) in Lemma 9 hold while running at most $T \ll W$ instances
of PLS. We can achieve this by starting a new phase only when one of these conditions would otherwise be
violated. We will show in Lemma 13 that this strategy incurs only polylogarithmically many phases.

3.2 Algorithm for Sliding Windows 

Algorithm 1 produces an approximate k-median solution on sliding windows. The remainder of the section
will establish properties of this algorithm, culminating in the main result given in Theorem 15.

The bulk of the bookkeeping required by Algorithm 1 is performed by the UPDATE subroutine, defined
in Algorithm 2. In the spirit of [11], central to our approach is a set of indices
$\{X_1, X_2, \ldots, X_T\}$. Each $X_i$ is the arrival time of a certain point from the stream. Algorithm 1
runs $O(T)$ instances of PLS on the stream, with the i-th instance starting with point $p_{X_i}$. Denote
this instance by $\mathcal{A}(X_i)$. The PLS algorithm on input $P = \{p_i, p_{i+1}, \ldots, p_j\}$
constructs a weighted set $S$ and a score $R$ such that
$\mathrm{OPT}(P, k) \le \mathrm{COST}(P, S) \le R \le \alpha \, \mathrm{OPT}(P, k)$. To check the smoothness
conditions in Lemma 9, we will use the solutions built up by certain instances of PLS. We keep an array of
$O(T^2)$ buckets, indexed as $B(X_i, X_j)$ for $1 \le i < j \le T$. In each bucket we store
$B(X_i, X_j) = (S_{ij}, R_{ij})$, where $S_{ij} = S(X_i, X_j)$ and $R_{ij} = R(X_i, X_j)$ are, respectively,
the weighted set and the cost estimate produced by an instance of PLS running on the substream
$\{p_{X_i}, \ldots, p_{X_j}\}$.
Concretely, we run instance $\mathcal{A}(X_i)$ on the stream starting with point $p_{X_i}$. At certain times,
say time $N$, we will need to store the solution currently built up by this instance. By this we mean that
we copy the weighted set and cost estimate as constructed by $\mathcal{A}(X_i)$ on points
$\{p_{X_i}, \ldots, p_N\}$ and store them in bucket $B(X_i, N)$. We denote this by
$B(X_i, N) \leftarrow \mathrm{store}(\mathcal{A}(X_i))$. Instance $\mathcal{A}(X_i)$ continues running after
this operation. We can view each $B(X_i, X_j)$ as a snapshot of the PLS algorithm as run on points
$\{p_{X_i}, \ldots, p_{X_j}\}$. As necessary, we terminate PLS instances and initialize new ones over the
course of the algorithm. We assume an offline k-median $O(1)$-approximation algorithm $\mathcal{M}$, and
denote by $\mathcal{M}(P)$ the centers returned by running this algorithm on point set $P$.
Algorithm 1 Metric k-median in Sliding Windows

Input: A stream of points $D = \{p_1, p_2, \ldots, p_n\}$ from metric space $(X, d)$, window size $W$

Update Process, upon the arrival of new point $p_N$:

1: for $i = 1, 2, \ldots, T$ do
2: $B(X_i, N) \leftarrow \mathrm{store}(\mathcal{A}(X_i))$
3: Begin running $\mathcal{A}(N)$
4: UPDATE()

Output: Return the centers and cost estimate from bucket $B(X_1, N)$


Algorithm 2 Update: prevents Algorithm 1 from maintaining too many buckets.

1: If $X_2 > N - W$, then $i \leftarrow 1$. Otherwise, $i \leftarrow 2$.
2: while $i < T$ do
3: $j \leftarrow$ the maximal $j'$ such that $\beta R(X_i, N) \le \gamma R(X_{j'}, N)$. If none exist, $j \leftarrow i + 1$
4: $C \leftarrow \mathcal{M}(S(X_i, X_j))$
5: while $i < j$ do
6: mark($X_i$)

7: i <— the maximal ^ such that (c) n (c) n 5^^| for all c G C. If none exist, i <— i + 1 

8: mark($X_j$)
9: $i \leftarrow j + 1$
10: For all unmarked $X_i$, terminate instance $\mathcal{A}(X_i)$
11: Delete all buckets $B(X_i, X_j)$ for which either $X_i$ or $X_j$ is unmarked.
12: Delete all unmarked indices $X_i$; relabel and unmark all remaining indices.


Lemma 12 For any index $m \le N$, let $s$ be the maximal index such that $[m, N] \subseteq [X_s, N]$. Then
$\mathrm{OPT}([X_s, N]) \le (2 + \beta\gamma) \mathrm{OPT}([m, N])$.

Proof: If $X_s = m$, there is nothing to prove, so assume $X_s < m$. This implies that index $m$ was deleted
at some previous time $Q$. The result follows by considering the state of the algorithm at this time.
Algorithm 1 maintains the conditions required by Lemma 9 to ensure that for any suffix $C$,
$\mathrm{OPT}([X_s, Q] \cup C) \le (2 + \beta\gamma) \mathrm{OPT}([m, Q] \cup C)$. Letting
$C = [Q + 1, N]$ yields the result. Details are given in the Appendix. □

In what follows, let $\mathrm{OPT}' = \mathrm{OPT}(\mathcal{W}, k) / \rho(\mathcal{W})$ and
$n = |\mathcal{W}|$, the size of the window.

Lemma 13 Algorithm 1 maintains $O(k \log n \log \mathrm{OPT}')$ buckets.

Proof: Each iteration of the loop on Line 2 decreases $R(X_i)$ by a factor of $\gamma/\beta$, so this loop
is executed $O(\log_{\gamma/\beta} \mathrm{OPT}')$ times. In each iteration of the loop on Line 5, the size
of at least one of the k clusters decreases by half. Each set has size at most n, so this loop is executed
$O(k \log_2 n)$ times. Each execution of each loop stores one bucket, so in total
$O(k \log n \log \mathrm{OPT}')$ buckets are stored in these nested loops. □


Lemma 14 Assuming $\mathrm{OPT}' = \mathrm{poly}(n)$, Algorithm 1 requires $O(\mathrm{poly}(k, \log n))$ update time.

Proof : The runtime of Algorithm [U is dominated by the O(Tk^log^n) time required to partition the 
buckets, where $T = O(k \log^2 n)$ by Lemma 13. A more detailed proof is given in the Appendix. □

Theorem 15 Assuming $\mathrm{OPT}' = \mathrm{poly}(n)$, there exists an $O(1)$-approximation for the metric
k-median problem on sliding windows using $O(k^3 \log^6 n)$ space and $O(\mathrm{poly}(k, \log n))$ update time.
Proof: Using Algorithm 1, we output a $\beta$-approximation for $[X_1, N]$, which includes the current
window $\mathcal{W}$. By Lemma 12, $\mathrm{OPT}([X_1, N], k) \le (2 + \beta\gamma) \mathrm{OPT}(\mathcal{W}, k)$,
and thus $R(X_1, N) \le \beta(1 + \lambda + \beta\gamma\lambda) \mathrm{OPT}(\mathcal{W}, k)$. Let $C$ be the
approximate centers for $[X_1, N]$. We have the following inequalities:

$\mathrm{COST}([X_1, N], C) \le \beta \, \mathrm{OPT}([X_1, N])$

$\mathrm{OPT}([X_1, N]) \le (1 + \lambda + \beta\gamma\lambda) \mathrm{OPT}(\mathcal{W}, k)$

$\mathrm{COST}(\mathcal{W}, C) \le \mathrm{COST}([X_1, N], C)$,

where the last inequality follows from the fact that $[X_1, N]$ contains the current window. Connecting
these inequalities, we have
$\mathrm{COST}(\mathcal{W}, C) \le \beta(1 + \lambda + \beta\gamma\lambda) \mathrm{OPT}(\mathcal{W}, k)$, as desired.

For the space bound, note that for each $1 \le i < j \le T$, bucket $B(X_i, X_j)$ contains the output of an
instance of PLS. Each of these $O(T^2)$ instances requires $O(k \log^2 n)$ space, and
$T = O(k \log n \log \mathrm{OPT}')$ by Lemma 13, so our assumption that $\mathrm{OPT}' = \mathrm{poly}(n)$
implies that we use $O(k^3 \log^6 n)$ space in total. □
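
The arithmetic behind the space bound can be spelled out step by step, using only the quantities stated in the proof and the assumption $\mathrm{OPT}' = \mathrm{poly}(n)$, which gives $\log \mathrm{OPT}' = O(\log n)$:

```latex
\begin{align*}
T &= O(k \log n \log \mathrm{OPT}') = O(k \log^2 n), \\
\text{number of buckets} &= O(T^2) = O(k^2 \log^4 n), \\
\text{total space} &= O(T^2) \cdot O(k \log^2 n) = O(k^3 \log^6 n).
\end{align*}
```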

4 Euclidean Coresets on Sliding Windows 

In this section we first explain a coreset technique that unifies many of the known coreset techniques. Then 
we explain the merge-and-reduce method. Finally we develop our sliding window algorithm for coresets. 

A Unified Coreset Technique Algorithm. Many coreset techniques for Euclidean spaces partition the point
set into small regions (sometimes called cells) and take a small number of points from each region of the
partition, either randomly or deterministically. For each region, each of the chosen points is assigned a
weight, which is often the number of points in that region divided by the number of chosen points from
that region. Some of the well-known coreset techniques in this class are (1) the coreset technique due to
Har-Peled and Mazumdar for the k-median and k-means problems in low-dimensional spaces [29]; (2) the
coreset technique due to Chen for the k-median and k-means problems in high-dimensional spaces [13];
(3) the coreset technique due to Feldman, Fiat and Sharir for the j-subspace problem in low-dimensional
spaces [21]; and (4) the coreset technique due to Feldman, Monemizadeh, Sohler and Woodruff for the
j-subspace problem in high-dimensional spaces [24]. We unify this class of coreset techniques in
Algorithm 3. In the sequel, we will use this unified view to develop a sliding window streaming algorithm
for this class of coreset techniques. We will give the proofs for k-median and k-means in low- and
high-dimensional spaces, and we defer the proofs for the j-subspace problem in low- and high-dimensional
spaces to the full version of this paper.


Algorithm 3 Algorithm $\mathcal{A}$: A unified coreset technique

Input: A set $P$ of $n$ points, a constant $c$ and two parameters $0 < \epsilon, \delta < 1$.

Algorithm:

1: Suppose we have a $(k, \epsilon)$-coreset technique CC that returns a partition $\Lambda_{\mathcal{K}}$ of $P$.
2: Let $d_{VC}$ be the VC-D of range space $(P, \mathcal{R})$ s.t. $\mathcal{R}$ is the set of shapes in $\mathbb{R}^d$ similar to regions in $\Lambda_{\mathcal{K}}$.
3: Suppose CC samples a set of size $s_{CC} = f(n, d, \epsilon, \delta)$ from each region $R$, where $s_{CC}$ is a function of $n, d, \epsilon, \delta$.
4: For each region $R \in \Lambda_{\mathcal{K}}$ we treat a weighted point $p$ of weight $w_p$ as $w_p$ points at the coordinates of $p$.
5: Sample $r = \min(|R|, \max(s_{CC}, O(d_{VC}\,\epsilon^{-2} \log(n) \cdot \log(\ldots))))$ points uniformly at random.
6: Each such sampled point receives a weight of $n_R / r$, where $n_R$ is the number of points in region $R$.
7: Let $\mathcal{K}$ be the union of all (weighted) sampled points that are taken from regions in partition $\Lambda_{\mathcal{K}}$.

Output: A coreset $\mathcal{K}$ of $P$ and its partition $\Lambda_{\mathcal{K}}$.
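
The sampling core of the unified technique (Steps 5 to 7) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the input points carry unit weight, the partition is supplied by a hypothetical callback `region_of`, and the single parameter `sample_size` stands in for the per-region bound $\max(s_{CC}, O(\cdot))$, which the sketch does not compute.

```python
import random
from collections import defaultdict

def unified_coreset(points, region_of, sample_size, rng=None):
    """Sample up to `sample_size` points per region and weight each by n_R / r.

    `region_of` (hypothetical) maps a point to a hashable region identifier.
    Returns a list of (point, weight) pairs.
    """
    rng = rng or random.Random(0)
    regions = defaultdict(list)
    for p in points:
        regions[region_of(p)].append(p)

    coreset = []
    for members in regions.values():
        n_r = len(members)              # number of points in the region
        r = min(n_r, sample_size)       # never sample more than the region holds
        for p in rng.sample(members, r):
            coreset.append((p, n_r / r))  # weights in a region sum to n_R
    return coreset

# Usage: 100 points on a line, split into 4 regions of width 25.
pts = [float(i) for i in range(100)]
cs = unified_coreset(pts, lambda p: int(p // 25), sample_size=5)
```

Because every sampled point in a region gets weight $n_R / r$, the total weight per region equals the original point count, which is the property the later density lemmas rely on.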


Lemma 16 Let $P$ be a point set of size $n$ in a $d$-dimensional Euclidean space and $0 < \epsilon \le 1$ be a parameter. Suppose we invoke one of the $(k, \epsilon)$-coreset techniques of [29] or [13], and let $\mathcal{K}$ be the reported coreset and $\Lambda_{\mathcal{K}}$ be the corresponding partition of $P$. Suppose that for every region $R \in \Lambda_{\mathcal{K}}$ containing $n_R$ points from $P$, we delete or insert up to $\epsilon n_R$ points. Let $\mathcal{K}'$ be the coreset reported by Algorithm $\mathcal{A}$ after these deletions or insertions. Then $\mathcal{K}'$ is a $(k, \epsilon)$-coreset of $\mathcal{K}$.

Proof: Proofs for [29] and [13] are in Sections B.1 and B.2, respectively. □

Merge and Reduce Operation. The merge-and-reduce method, inspired by a complete binary tree, is a generic method in computational geometry for implementing non-streaming algorithms in the insertion-only streaming model. Let $P$ be a set of $n$ points presented in a streaming fashion. The pseudocode of the merge-and-reduce operation is given below. In this pseudocode, we use buckets $\{B_1, B'_1, B_2, B'_2, \ldots, B_i, B'_i, \ldots, B_{\log n}, B'_{\log n}\}$, where buckets $B_i$ and $B'_i$ can be considered as buckets in level $i$ of the merge-and-reduce tree and are of size $x_i$, which will be determined for each concrete problem. All buckets $B_i, B'_i$ for $i \in [\log n]$ are initialized to zero at the beginning of the stream.


Algorithm 4 MergeReduce Operation

Input: A stream $S = [p_r, p_{r+1}, p_{r+2}, \ldots, p_{N-1}]$ of length $|S| = n^c$ for a constant $c$ and a point $p_N$.

Update Process, upon the arrival of new point $p_N$:

1: Let $i = 1$ and add $p_N$ to $B_i$ if $B_i$ is not full, otherwise to $B'_i$.
2: while $B_i$ and $B'_i$ are both full and $i \le \log(n)$ do
3: Compute coreset $Z$ and partition $\Lambda_Z$ using Algorithm $\mathcal{A}(B_i \cup B'_i, \epsilon_i = \epsilon/(2 \log n), \delta/n^c)$.
4: Delete the points of buckets $B_i$ and $B'_i$.
5: Let $B_{i+1}$ be $Z$ if $B_{i+1}$ is empty, otherwise let $B'_{i+1}$ be $Z$.
6: Let $i = i + 1$.

Output: Return coreset $S_x = \bigcup_{i=1}^{\log n} B_i \cup B'_i$ and partition $\Lambda_{S_x} = \Lambda_{B_i} \cup \Lambda_{B'_i}$.
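
The control flow of merge-and-reduce behaves like a binary counter: two full buckets at a level are merged, reduced, and carried to the next level. The Python sketch below illustrates only that control flow. The names `mr_insert`, `mr_query` and `halve` are hypothetical, a separate buffer collects raw points before they enter the tree, and the toy `halve` reducer is a stand-in for Algorithm $\mathcal{A}$ run with $\epsilon_i = \epsilon/(2 \log n)$; it preserves total weight but carries no approximation guarantee.

```python
def mr_insert(state, point, bucket_size, reduce_fn):
    """Insert one point into a merge-and-reduce tree.

    state = {"buffer": [...], "levels": [[bucket, ...], ...]}; each bucket is a
    list of (point, weight) pairs and each level holds at most two buckets.
    """
    state["buffer"].append((point, 1.0))
    if len(state["buffer"]) < bucket_size:
        return
    carry, state["buffer"] = state["buffer"], []
    i = 0
    while True:
        if i == len(state["levels"]):
            state["levels"].append([])
        state["levels"][i].append(carry)
        if len(state["levels"][i]) < 2:
            break
        a, b = state["levels"][i]        # both buckets of level i are full
        state["levels"][i] = []          # delete them ...
        carry = reduce_fn(a + b)         # ... and carry their reduced union upward
        i += 1

def mr_query(state):
    """Union of the buffer and all stored buckets."""
    out = list(state["buffer"])
    for level in state["levels"]:
        for bucket in level:
            out.extend(bucket)
    return out

def halve(items):
    """Toy reducer: merge neighbouring points pairwise, summing their weights."""
    items = sorted(items)
    out = []
    for j in range(0, len(items), 2):
        pair = items[j:j + 2]
        out.append((pair[0][0], sum(w for _, w in pair)))
    return out

state = {"buffer": [], "levels": []}
for x in range(64):
    mr_insert(state, float(x), bucket_size=4, reduce_fn=halve)
summary = mr_query(state)
```

With 64 insertions and buckets of size 4, the 16 leaf buckets collapse into a single level-4 bucket, so the summary stays small while the total weight still equals the number of points seen.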


The next lemma shows that the $(k, \epsilon)$-coreset maintained by Algorithm MergeReduce well-approximates the density of subsets of point set $P$ within every region of partition $\Lambda_{B_i}$ for $i \in [\log n]$. The proof of this lemma is given in Appendix B.3.

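Lemma 17 Let $B_i$ be the bucket at level $i$ of Algorithm MergeReduce with $(k, \epsilon)$-coreset $B_i$ and partition $\Lambda_{B_i}$. Suppose the original points in the subtree of $B_i$ form the subset $P_i \subseteq P$. For every region $R_i \in \Lambda_{B_i}$,

$$\big|\,|P_i \cap R_i| - |B_i \cap R_i|\,\big| \le \sum_{\text{level } j=2}^{i-1} \; \sum_{\text{node } x_j \text{ in level } j} \epsilon_j \Big( \sum_{R \in \Lambda_{x_j}} |O_{x_j} \cap R| \Big),$$

where $O_{x_j}$ is a multiset of points at node $x_j$ in level $j$ of the merge-and-reduce tree such that for every point $p \in O_{x_j}$ with weight $w_p$, we add $w_p$ copies of $p$ to $O_{x_j}$.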

Next we show that the error of Lemma 17 is small.

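Lemma 18 Let $B_i$ be the bucket at level $i$ of Algorithm MergeReduce with $(k, \epsilon)$-coreset $B_i$ and partition $\Lambda_{B_i}$. Suppose the original points in the subtree of $B_i$ form the subset $P_i \subseteq P$. For every $j \in [\log n]$, if we replace $\epsilon_j$ by $\epsilon/(2j)$, the error $\sum_{\text{level } j=2}^{i-1} \sum_{\text{node } x_j \text{ in level } j} \epsilon_j \big( \sum_{R \in \Lambda_{x_j}} |O_{x_j} \cap R| \big)$ of $\big|\,|P_i \cap R_i| - |B_i \cap R_i|\,\big|$ is an $\epsilon$-fraction of the cost of $P_i$ in terms of $k$ arbitrary $j$-dimensional subspaces and so can be ignored.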

* Here we assume n is known in advance. The case where n is not known in advance can be accommodated using repeated
guesses for n. See, for example, [29].
Proof: Let us look at the sub-terms in the error term $\sum_{\text{node } x_j \text{ in level } j} \epsilon_j \big( \sum_{R \in \Lambda_{x_j}} |O_{x_j} \cap R| \big)$. For a fixed node $x_j$ in level $j$, the term $\epsilon_j \sum_{R \in \Lambda_{x_j}} |O_{x_j} \cap R|$ corresponds to an $\epsilon_j$-fraction change in each region $R \in \Lambda_{x_j}$ of Algorithm $\mathcal{A}$. Using Lemma 16 for each one of the coreset techniques in [29] and [13], the new $(k, \epsilon)$-coreset after these changes in every region is again a $(k, 2\epsilon)$-coreset of point set $P_i$. Here we use the fact that a $(k, \epsilon)$-coreset of a $(k, \epsilon)$-coreset of $P$ is a $(k, 2\epsilon)$-coreset of $P$. We have $i$ levels, each one of which is a $(k, \epsilon)$-coreset of $P_i$. Thus, the error term is $i$ times the error of one $(k, \epsilon)$-coreset of $P_i$. If we replace $\epsilon_j$ by $\epsilon/(2j)$, the overall error is the same as the error of one $(k, \epsilon)$-coreset of $P_i$, which can be ignored. In Algorithm MergeReduce we replace $\epsilon_i = \epsilon/(2 \log n)$ for levels $i \in [\log n]$. □

Coreset Maintenance in Sliding Windows. In this section we develop Algorithm SWCoreset, a sliding window streaming algorithm for the class of coreset techniques (including the coreset techniques of [29], [13], [21], and [24]) that fit into Algorithm 3. We show that the number of $(k, \epsilon)$-coresets that we maintain is upper bounded by the size of one $(k, \epsilon)$-coreset times $O(\epsilon^{-2} \log n)$.

Algorithm 5 SWCORESET

Input: A stream $S = [p_1, p_2, p_3, \ldots, p_N, \ldots, p_n]$ of points
Output: A coreset $\mathcal{K}_{x_1}$ for window $W$, i.e., points $\{p_{N-W+1}, \ldots, p_N\}$.
Update Process, upon the arrival of new point $p_N$:

1: for $x_i \in X = [x_1, x_2, \ldots, x_t]$ where $x_i \in \{1, \ldots, N\}$ do
2: Let $(\mathcal{K}_{x_i}, \Lambda_{\mathcal{K}_{x_i}}) = \mathrm{MergeReduce}([x_i, N] = \{p_{x_i}, \ldots, p_{N-1}\}, p_N)$ be the coreset and its partition.
3: Let $t = t + 1$, $x_t = N$.
4: Let $(\mathcal{K}_{x_t}, \Lambda_{\mathcal{K}_{x_t}}) = \mathrm{MergeReduce}(\{\}, p_N)$ be the coreset and the partition of single point $p_N$.
5: for $i = 1$ to $t - 2$ do
6: Find greatest $j > i$ s.t. there is a region $R$ in partition $\Lambda_{\mathcal{K}_{x_i}}$ whose at most $\epsilon w_R$ weight is in $[x_i, x_j]$.
7: for $i < r < j$ do
8: Delete $x_r$, coreset $\mathcal{K}_{x_r}$ and partition $\Lambda_{\mathcal{K}_{x_r}}$. Update the indices in sequence $X$ accordingly.
9: Let $i$ be the smallest index such that $p_{x_i}$ is expired and $p_{x_{i+1}}$ is active.
10: for $r < i$ do
11: Delete $x_r$, coreset $\mathcal{K}_{x_r}$ and partition $\Lambda_{\mathcal{K}_{x_r}}$, and update the indices in sequence $X$.

Output Process:

1: Return coreset $\mathcal{K}_{x_1}$ maintained by $\mathrm{MergeReduce}([x_1, N] = \{p_{x_1}, \ldots, p_{N-1}\}, p_N)$.


Theorem 19 Let $P \subset \mathbb{R}^d$ be a point set of size $n$. Suppose the optimal cost of clustering of point set $P$ is $\mathrm{OPT}_P = n^c$ for some constant $c$. Let $s$ be the size of a coreset (constructed using one of the coreset techniques [29], [13]) that the merge-and-reduce method maintains for $P$ in the insertion-only streaming model. There exists a sliding window algorithm that maintains this coreset using $O(s^2 \epsilon^{-2} \log n)$ space.

Proof: According to Algorithm SWCORESET, the next index that we keep in sequence $X$ occurs when an $\epsilon$-fraction of a region $R \in \Lambda_{\mathcal{K}_{x_i}}$ changes. Since $s$ is the size of the $(k, \epsilon)$-coreset that Algorithm MergeReduce maintains for $P$, the number of regions in partition $\Lambda_{\mathcal{K}_{x_i}}$ is also at most $s$. By Lemma 16, as long as at most an $\epsilon$-fraction of the weight of a region in the partition drops, we still have a $(k, \epsilon)$-coreset. Thus, after at most $s/\epsilon$ indices, the optimal clustering cost drops by at least an $\epsilon$-fraction of its cost. Therefore, after $\Theta(\log_{1+\epsilon} n) = O(\epsilon^{-1} \log n)$ such $\epsilon$-fraction drops in the optimal clustering cost, the cost converges to zero. Overall, the number of indices that we maintain is $O(s \epsilon^{-2} \log n)$. Moreover, for each index we maintain a $(k, \epsilon)$-coreset of size $s$ using Algorithm MergeReduce; therefore, the space usage of our algorithm is $O(s^2 \epsilon^{-2} \log n)$. □
A Missing Proofs and Further Results from Metric k-Median Clustering in Sliding Windows

A.1 k-median and k-means are not smooth functions

Claim 20 k-median and k-means clustering are not smooth functions. That is, there exist sets of points $A$, $B$ and $C$ such that for any $\gamma, \beta > 0$, $\mathrm{OPT}(A \cup B, k) \le \gamma\,\mathrm{OPT}(B, k)$ but $\mathrm{OPT}(A \cup B \cup C, k) > \beta\,\mathrm{OPT}(B \cup C, k)$.

Proof: Let $A \cup B$ consist of $k$ distinct points, with $A \ne \emptyset$, $B \ne \emptyset$, and $A$ containing one or more points not in $B$. Then $B$ contains at most $k - 1$ distinct points, and $\mathrm{OPT}(A \cup B, k) = 0 \le \gamma\,\mathrm{OPT}(B, k) = 0$ for any $\gamma$. Consider a set $C$ consisting of a single point and satisfying $C \cap (A \cup B) = \emptyset$. Then $A \cup B \cup C$ has at least $k + 1$ distinct points, so that $\mathrm{OPT}(A \cup B \cup C, k) > 0$, while $\mathrm{OPT}(B \cup C, k) = 0$. Then for any $\beta$, we have that $\mathrm{OPT}(A \cup B \cup C, k) > 0 = \beta\,\mathrm{OPT}(B \cup C, k)$. □
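
A concrete instance of this construction can be checked by brute force. The sketch below is illustrative only: it assumes points on the real line with $k = 2$, $A = \{0\}$, $B = \{1\}$, $C = \{5\}$, restricts centers to input points, and uses a hypothetical helper `opt_k_median` that enumerates all center sets.

```python
from itertools import combinations

def opt_k_median(points, k):
    """Brute-force optimal k-median cost on the line, centers chosen from the points."""
    distinct = sorted(set(points))
    if len(distinct) <= k:
        return 0.0  # every distinct point can be its own center
    return float(min(
        sum(min(abs(p - c) for c in centers) for p in points)
        for centers in combinations(distinct, k)
    ))

A, B, C, k = [0.0], [1.0], [5.0], 2
prefix_cost = opt_k_median(A + B, k)       # 0: only k distinct points
suffix_cost = opt_k_median(B, k)           # 0
extended = opt_k_median(A + B + C, k)      # positive: k + 1 distinct points
extended_suffix = opt_k_median(B + C, k)   # 0: still only k distinct points
```

Both costs before appending $C$ are zero, yet after appending $C$ the cost of the longer prefix becomes positive while the shorter one remains zero, so no constant $\beta$ can bound their ratio.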

A.2 Proof of Lemma 9

Proof of Lemma 9: Let $t$ be the $\beta$-approximate clustering map of $A \cup B$ in the hypothesis. By assumption, $t$ induces partitions $A_1, A_2, \ldots, A_k$ and $B_1, B_2, \ldots, B_k$ of $A$ and $B$, respectively, given by $A_i = t^{-1}(c_i) \cap A$ and $B_i = t^{-1}(c_i) \cap B$. Since $|A_i| \le |B_i|$ for all $i \in [k]$ by assumption, for each $i \in [k]$ there exists a one-to-one mapping $g_i$ from $A_i$ to a subset of $B_i$. Letting $a \in A_i$, we have

$$d(a, g_i(a)) \le d(a, t(a)) + d(g_i(a), t(a)) = d(a, t(a)) + d(g_i(a), t(g_i(a))),$$

where the inequality is the approximate triangle inequality and the equality follows from the fact that $t(g_i(a)) = c_i = t(a)$ by definition of $g_i$. Thus,

$$\sum_{i=1}^{k} \sum_{a \in A_i} d(a, g_i(a)) \le \sum_{i=1}^{k} \sum_{a \in A_i} \big[ d(a, t(a)) + d(g_i(a), t(g_i(a))) \big] \le \sum_{x \in A \cup B} d(x, t(x)) \le \beta\,\mathrm{OPT}(A \cup B, k) \le \beta\gamma\,\mathrm{OPT}(B, k), \quad (1)$$

where the first inequality follows from the approximate triangle inequality, the second inequality follows from the definition of $g_i$, the third inequality follows since $t$ is $\beta$-approximate, and the fourth inequality holds by assumption. Let $s$ be an optimal clustering map for $B \cup C$ (i.e., $s$ is 1-approximate). Then we have

$$\sum_{i=1}^{k} \sum_{a \in A_i} d(g_i(a), s(g_i(a))) \le \sum_{i=1}^{k} \sum_{b \in B_i} d(b, s(b)) \le \mathrm{OPT}(B \cup C, k). \quad (2)$$

Bounding the cost of connecting $A$ to the optimal centers of $B \cup C$, we obtain

$$\sum_{a \in A} d(a, s(a)) = \sum_{i=1}^{k} \sum_{a \in A_i} d(a, s(a)) \le \sum_{i=1}^{k} \sum_{a \in A_i} d(a, s(g_i(a)))$$

$$\le \sum_{i=1}^{k} \sum_{a \in A_i} \lambda \big[ d(a, g_i(a)) + d(g_i(a), s(g_i(a))) \big] \le \lambda\,\mathrm{OPT}(B \cup C, k) + \beta\gamma\lambda\,\mathrm{OPT}(B, k),$$

where the first inequality follows from the fact that $s(a)$ is the closest center to $a$ by definition, the second inequality follows from the approximate triangle inequality, and the third inequality follows from equations (1) and (2). Thus we conclude that

$$\mathrm{OPT}(A \cup B \cup C, k) \le \sum_{a \in A} d(a, s(a)) + \sum_{x \in B \cup C} d(x, s(x)) \le (1 + \lambda)\,\mathrm{OPT}(B \cup C, k) + \beta\gamma\lambda\,\mathrm{OPT}(B, k) \le (1 + \lambda + \beta\gamma\lambda)\,\mathrm{OPT}(B \cup C, k).$$

□

A.3 Proof of Lemma 11

Proof of Lemma 11: Using our algorithm from Lemma 10, we proceed as before until we are notified of
the first point of A 2 . Here, we store 5^2 PLS(A 2 ), but we continue running this instance of the PLS 
algorithm. As before, we commence a new instance of the PLS algorithm beginning with the first point of 
A 2 . In general, whenever a transition occurs to Aj, for all i < j we store 51,i <— PLS(Ai U • • • U Aj_i ] and 
continue running all instances. As a result, we maintain sets $S_{i,j}$ for $i \in [Z]$ and $j \ge i$. There are $O(Z^2)$
such sets, each of size $O(k \log^2 n)$. When we wish to compute the $\beta$-approximate map, we run an offline
$O(1)$-approximation on $\bigcup_i S_{i,Z}$. The cluster sizes are computed as in Lemma 10. □

A.4 Proof of Lemma 12

Proof of Lemma [E] : If Xs = m, we have equality, so suppose Xj < m, implying that index m was 
previously deleted at some time Q, when pq the most recent point to arrive. Let z be the index (assigned 
before deletion) such that X^ = m. During some iteration of the loop on Lines |2]l9j it must have held after 
Line[3]that Xt < Xj < X^ < Xj. This is because both X^ and Xj are stored, so s > i by maximality of s. 

Line|3]guarantees |3R([Xi, Q]] < yR([Xj, Q]). Since the (3-approximation ensures that OPT(-) < R(-) < 
|3 opt(-], this implies that OPT([Xi, Q]) < yOPT([Xz;, Q]). Let t denote the (uncalculated, but existent) (3- 
approximate map of [Xy Q] as in Lemma [TOl The loop on Line |5] ensures that |t“^ (Cw] H (Cw) H 

SijI for every w E [k]. Therefore, Lemma |9] guarantees that for any suffix C, OPT([Xi, Q] U C] < (2 + 
|3y)0PT([Ta, Q] U C). By letting C = [Q + 1, N], the result is obtained. □ 

A.5 Proof of Lemma 14

Proof of Lemma 14: Let $h(\cdot)$ be the update time for PLS and $g(\cdot)$ be the time for the offline c-
approximation. Feeding new point $p_N$ to all $T$ instances of PLS requires $T h(n)$ time and computing the c-
approximation for all $O(\log \mathrm{OPT}) = O(\log n)$ iterations of the loop on Line 2 requires $O(\log n \cdot g(k \log^2 n))$
time. Partitioning each bucket requires 0(Tk^log^n] time, and finding the maximal index on Line (Tire- 
quires O (T^k) time. In total, an update takes O (h(n) • T -|- log n • g (k log^ n] -p Tk^ log^ n + T^k] time. By 
Lemma 13, $T = O(k \log^2 n)$. By [12], the update time of PLS is polynomial in its argument, and the same holds for any
of a number of offline $O(1)$-approximations for k-median, for example [4, 32]. Moreover, $g(\cdot)$ and $h(\cdot)$ are
such that the last two terms are the largest factors, resulting in an update time of $O(k^3 \log^4 n)$. □

A.6 Algorithm for Metric k-MEANS 

The metric k-means problem is defined similarly. 

Definition 21 (Metric k-means) Let $P \subseteq X$ be a set of $n$ points in a metric space $(X, d)$ and let $k \in \mathbb{N}$ be a natural number. Suppose $C = \{c_1, \ldots, c_k\}$ is a set of $k$ centers. The clustering of point set $P$ using $C$ is the partitioning of $P$ such that a point $p \in P$ is in cluster $C_i$ if $c_i \in C$ is the nearest center to $p$ in $C$, that is, point $p$ is assigned to its nearest center $c_i \in C$. The cost of k-means clustering by $C$ is $\mathrm{COST}^2(P, C) = \sum_{p \in P} d^2(p, C)$. The metric k-means problem is to find a set $C \subseteq P$ of $k$ centers that minimizes the cost $\mathrm{COST}^2(P, C)$, that is

$$\mathrm{COST}^2(P, C) = \sum_{p \in P} d^2(p, C) = \min_{C' \subseteq P : |C'| = k} \mathrm{COST}^2(P, C') = \min_{C' \subseteq P : |C'| = k} \sum_{p \in P} d^2(p, C'),$$

where $d^2(p, C) = \min_{c \in C} d^2(p, c)$ and $d^2(p, C') = \min_{c \in C'} d^2(p, c)$.


A concept used by our algorithm for the metric k-means is the notion of $\sigma$-separability [10]. Intuitively, data which is separable for a high value of $\sigma$ is well-clusterable into $k$ clusters (i.e., removing one center greatly increases the optimal cost).

Definition 22 ($\sigma$-separable dataset) [10] A set of input data is said to be $\sigma$-separable if the ratio of the optimal k-means cost to the optimal $(k - 1)$-means cost is at most $\sigma^2$.


We now turn to the modifications necessary for k-MEANS. We state the main theorem and explain the 
necessary changes in the remainder of this section. 

Theorem 23 Assuming $\mathrm{OPT}' = \mathrm{poly}(n)$ and every window is $\sigma$-separable for some $\sigma = O(1)$, there exists a sliding windows algorithm which maintains an $O(\sigma^4)$-approximation for the metric k-MEANS problem using $O(k^3 \log^6 n)$ space and $O(k^3 \log^4 n)$ update time.

In Lemma 10 for k-MEDIAN we had used the PLS algorithm to maintain a weighted set $S$ such that $\mathrm{COST}(P, S) \le \alpha\,\mathrm{OPT}(P, k)$ for some constant $\alpha$. Instead, we now use the insertion-only k-MEANS algorithm of [10]. This algorithm also works by providing a weighted set $S$ such that $\mathrm{COST}(P, S) \le \alpha\,\mathrm{OPT}(P, k)$ for some constant $\alpha$. Here, the space required is again $O(k \log^2 n)$. The approximation factor $\alpha$ is now $O(\sigma^2)$, where the data is $\sigma$-separable as defined in Definition 22. The second modification is in Algorithm 1. For k-MEDIAN, we had used Lemma 9 with $\lambda = 1$ since the k-MEDIAN function satisfies the 1-approximate triangle inequality (i.e., the standard triangle inequality). For k-MEANS, we now satisfy the 2-approximate triangle inequality, so we use the lemma with $\lambda = 2$. Theorem 15 still holds without modification, so we obtain an $O(\sigma^4)$-approximation.
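
The 2-approximate triangle inequality for squared distances, $d^2(a, c) \le 2\,(d^2(a, b) + d^2(b, c))$, can be sanity-checked numerically. The sketch below assumes Euclidean points given as tuples; `sq_dist` and `max_ratio` are hypothetical helper names. Three equally spaced collinear points attain the factor 2 exactly, which shows the constant cannot be improved.

```python
import itertools
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def max_ratio(points):
    """Largest d^2(a,c) / (d^2(a,b) + d^2(b,c)) over ordered triples of points."""
    best = 0.0
    for a, b, c in itertools.permutations(points, 3):
        denom = sq_dist(a, b) + sq_dist(b, c)
        if denom > 0:
            best = max(best, sq_dist(a, c) / denom)
    return best

tight = max_ratio([(0.0,), (1.0,), (2.0,)])   # midpoint case: 4 / (1 + 1)

rng = random.Random(1)
cloud = [(rng.random(), rng.random()) for _ in range(12)]
observed = max_ratio(cloud)                   # stays at or below 2
```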

B Missing Proofs and Further Results from Euclidean Coresets in Sliding Windows 

Here, we briefly review the definition of VC-dimension and $\epsilon$-sample as presented in Alon and Spencer's book [1]. A range space $S$ is a pair $(X, R)$, where $X$ is a set and $R$ is a family of subsets of $X$. The elements of $X$ are called points and the subsets in $R$ are called ranges. For $A \subseteq X$, we define the projection of $R$ on $A$ as $P_R(A) = \{r \cap A : r \in R\}$. We say $A$ is shattered if $P_R(A)$ contains all subsets of $A$. The Vapnik-Chervonenkis dimension (VC-D) of $S$, which we denote by $d_{VC}(S)$, is the maximum cardinality of a shattered subset of $X$. If there exist arbitrarily large shattered subsets of $X$, then $d_{VC}(S) = \infty$. Let $(X, R)$ be a range space with VC-D $d$, $A \subseteq X$ with $|A|$ finite, and let $0 < \epsilon < 1$ be a parameter. We say that $B \subseteq A$ is an $\epsilon$-sample for $A$ if for any range $r \in R$,

$$\left| \frac{|A \cap r|}{|A|} - \frac{|B \cap r|}{|B|} \right| \le \epsilon.$$

Lemma 24 ([3]) Let $S = (X, R)$ be a range space of VC-D $d_{VC}(S)$, $A \subseteq X$ with $|A|$ finite, and let $0 < \epsilon < 1$ be a parameter. Let $c > 1$ be a constant and $0 < \delta < 1$. Then a random subset $B$ of $A$ of size $s = \min\big(|A|, \frac{c}{\epsilon^2} \cdot (d_{VC}(S) \log(d_{VC}(S)/\epsilon) + \log(1/\delta))\big)$ is an $\epsilon$-sample for $A$ with probability at least $1 - \delta$.
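
To make the $\epsilon$-sample notion concrete, the sketch below measures how well a subset $B$ represents $A$ for the range space of closed intervals on the real line. It assumes the standard definition, namely that $B$ is an $\epsilon$-sample when $\big|\,|A \cap r|/|A| - |B \cap r|/|B|\,\big| \le \epsilon$ for every range $r$; the function name `interval_discrepancy` is hypothetical. Only intervals with endpoints at data points need to be examined, since any other non-empty interval contains the same points as one of those.

```python
def interval_discrepancy(A, B):
    """Max over closed intervals r of | |A∩r|/|A| - |B∩r|/|B| | for points on the line."""
    cuts = sorted(set(A) | set(B))
    worst = 0.0
    for i, lo in enumerate(cuts):
        for hi in cuts[i:]:
            frac_a = sum(lo <= x <= hi for x in A) / len(A)
            frac_b = sum(lo <= x <= hi for x in B) / len(B)
            worst = max(worst, abs(frac_a - frac_b))
    return worst

A = list(range(10))
B = [0, 2, 4, 6, 8]            # every other point of A
eps_needed = interval_discrepancy(A, B)
```

In this example the smallest $\epsilon$ for which $B$ qualifies is 0.1 (up to floating-point rounding), because any interval contains at most one more odd than even integer or vice versa.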
Lemma 25 ([27] (Chapter 5)) Let $S = (X, R)$ and $T = (X, R')$ be two range spaces of VC-D $d_{VC}(S)$ and $d_{VC}(T)$, respectively, where $d_{VC}(S), d_{VC}(T) > 1$. Let $U = \{r \cup r' \mid r \in R, r' \in R'\}$ and $I = \{r \cap r' \mid r \in R, r' \in R'\}$. Then the range spaces $\hat{S} = (X, U)$ and $\hat{S}' = (X, I)$ have $d_{VC}(\hat{S}) = d_{VC}(\hat{S}') = O((d_{VC}(S) + d_{VC}(T)) \log(d_{VC}(S) + d_{VC}(T)))$. In general, unions, intersections and any finite sequence of combining range spaces with finite VC-D results in a range space with a finite VC-D.

In $\mathbb{R}^d$ the VC-D of a half space is $d$ [1]. Balls, ellipsoids and cubes in $\mathbb{R}^d$ have VC-D $O(d^2)$, as can be easily verified by lifting the points into $O(d^2)$ dimensions, where each one of these shapes in the original space is mapped into a half space.
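
The lifting argument is easiest to see for balls, where one extra coordinate suffices: $\|x - c\|^2 \le r^2$ is equivalent to $\|x\|^2 - 2\,c \cdot x \le r^2 - \|c\|^2$, which is linear in the lifted point $(x, \|x\|^2)$. The sketch below illustrates this special case only (ellipsoids need the quadratic monomials, hence the $O(d^2)$ figure); the helper names are hypothetical.

```python
def lift(x):
    """Map x in R^d to (x_1, ..., x_d, |x|^2) in R^(d+1)."""
    return tuple(x) + (sum(v * v for v in x),)

def ball_as_halfspace(center, radius):
    """Return (w, b) such that x lies in the ball iff w . lift(x) <= b."""
    w = tuple(-2.0 * c for c in center) + (1.0,)
    b = radius ** 2 - sum(c * c for c in center)
    return w, b

def in_halfspace(w, b, y):
    return sum(wi * yi for wi, yi in zip(w, y)) <= b

def in_ball(center, radius, x):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, center)) <= radius ** 2

center, radius = (1.0, 2.0), 2.0
w, b = ball_as_halfspace(center, radius)
samples = [(1.0, 2.0), (3.0, 2.0), (4.0, 2.0), (0.0, 0.0), (-2.0, 5.0)]
agree = all(in_halfspace(w, b, lift(x)) == in_ball(center, radius, x) for x in samples)
```

Membership in the ball and membership in the lifted half space agree on every sample point, including the boundary point (3, 2).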

Next, we first review two well-known coreset techniques that fit into the framework of Section 4. These coreset techniques are

(1) Coreset technique of [29] due to Har-Peled and Mazumdar for the k-median and the k-means in low dimensional Euclidean spaces, i.e., when dimension d is constant.

(2) Coreset technique of [13] due to Chen for the k-median and the k-means problems in high dimensional Euclidean spaces, i.e., when dimension d is not constant.


Next, we prove Lemma 16 for each one of these coreset techniques. Interestingly, the coreset technique
of [29] is the basis of almost all follow-up coresets for the k-median and the k-means in low dimensional
Euclidean spaces. This includes the coreset techniques of |[^|28]| . The same is true for the coreset tech¬ 
nique of ifTSll which is the basis of almost all follow-up coresets for the k-median and the k-means in high 
dimensional Euclidean spaces. This includes the coreset techniques of 12311^1^122]) . 

Finally we prove Lemma 17.

B.1 Coreset Technique of [29] due to Har-Peled and Mazumdar


We explain their coreset technique for the k-median problem. The same coreset technique works for the
k-means problem. We first invoke an $\alpha$-approximation algorithm on a point set $P \subset \mathbb{R}^d$ that returns a set
$C = \{c_1, \ldots, c_k\} \subset \mathbb{R}^d$ of $k$ centers. Let $C_i$ be the set of points that are in the cluster of center $c_i$, i.e.
Ci = {p G P : d(p, Ci) < mincjgc d.(p, Cj)}. We consider logn -|- 1 balls BalUj (ct, 2> • 22§I(fi£l] centered 

at center Ct of radii 2^ •- ^ for j G {0,1,2, • • • , log n}. In BalUj, we impose a grid Gij of side length 

• 2> •- ' and for every non-empty cell c in grid Gij we replace all points in c with one point 

of weight $n_c$, where $n_c$ is the number of points in $c$. Let $\mathcal{K}_P$ be the set of cells returned by this coreset
technique, that is $\mathcal{K}_P = \bigcup c$. We also let partition $\Lambda_P$ be the set of cells $\Lambda_P = \{G_{i,j} \cap Ball_{i,j} : i \in [k], j \in [\log n + 1]\}$.

In Step 3 of Algorithm [3] we let scc = f (tl, d, e, 6) = 1 (which is the number of points that this coreset 
technique samples from each cell) be 1. Moreover, we let parameter s in Eemma [T^ (which is the size of 
this coreset maintained by merge-and-reduce approach) be log^^^^^ n). We now prove a variant of 

Eemma[T^for this coreset. Observe that to adjust the parameter e in Eemma [T^ for Eemma 1^ we replace 


Lemma 26 Let $P \subset \mathbb{R}^d$ be a point set of size $n$ and $k \in \mathbb{N}$ be a parameter. Let $\Lambda_P = \{c_1, c_2, \ldots, c_x\}$
be the partition set of cells returned by the coreset technique of Har-Peled and Mazumdar [29]. Let $B =$
(hi , • • • , b]^} C R'^ be an arbitrary set ofV centers. Suppose for every cell c £ Ap we delete up to • ric 
points and let d be cell c after deletion of these points. V/e then have 


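$$\sum_{c \in \Lambda_P} \big| \mathrm{COST}(c', B) - \mathrm{COST}(c, B) \big| \le \epsilon \cdot \mathrm{COST}(P, B),$$

where $\mathrm{COST}(c, B) = \sum_{p \in c} d(p, B)$ and $\mathrm{COST}(c', B) = \sum_{p \in c'} d(p, B)$.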
Proof: Let us fix a cell $c \in \Lambda_P$. Suppose $c \in G_{i,j}$ for a particular cluster and $j \in \{0, 1, 2, \ldots, \log n\}$.
Assume the nearest center to cell c is be G B. We either have d(c, be) > |ord(c,be) < If we have 

(i(c, be) > then COST(c, .S) > ric • d(c, be) > rie • On the other, since we delete up to • Tie points 
of cell c, the cost that we lose is at most 


^ •Tle(\/dfe + Ci(c,be)) < 


• Tie • (v/de + 1) • d(c, be) < e • COST(c, B] . 

Vd 


Now suppose we have d(c,be) < T- We have two cases, either j = 0 or j > 0. We first prove the 
lemma when j = 0. Since cell c is in Gt^o> we have fe = • 22§I(Z£i Therefore, d(c,be) < ^ — 

• tiOST(P,C) ^ every cell in Ballt^o G Gi^o we delete up to • ric points. Therefore, the cost that 


we lose is at most 


L e 

7d 


cGBciII{ 0 


nc • —• COST(P, C) < • C0ST(P,,8) . 

lOvda TT- lOda lOd 


The second case is when j > 0. Recall that d(c,bc) < Observe that since j > 0, d(c, Ct) > 


2j-1. COST(P,C) ^ __e .2j. COST(P,C) ^ 

^0\^da 

2 

For each such cell c we delete up to • Tie points. Thus, taking the summation over all i and j > 0 yields 


^^•d(c,Ci) which means d(c,bc) < ^ < ^^•d(c,Ci). 


t=l j>0 ceGvjnBallij 


e 

71 


Tic • d(c,bc) <^Y- Y- 


TL, 


i=l j>0 cSGijnBaUi^j 


Vd bVd^ 


(X 


d(c,Ci) 


2 2 

< ^ • COST(P, C) < |- • COST(P, B) . 
5da 5d 


□ 


B.2 Coreset Technique of [13] due to Chen

We explain his coreset technique for the k-median problem. The same coreset technique works for the
k-means problem. We first invoke an $\alpha$-approximation algorithm on a point set $P \subset \mathbb{R}^d$ that returns a set
$C = \{c_1, \ldots, c_k\} \subset \mathbb{R}^d$ of $k$ centers. Let $C_i$ be the set of points that are in the cluster of center $c_i$, i.e.
Cl, = {p G P : d(p, Ci) < minc.gc d(p, Cj)}. We consider logn + 1 balls Bally (ci,2’ • C2§Iif>£i] centered 
at center Ct of radii 2> • for j g {0,1,2, • • • , log n}. In ring Ballij\Ballij_i having ny points, we 

take a sample set Sij of size s = min(nij, e~^dklog( *^^°|”^ ) points uniformly at random and we assign a 

weight of $n_{i,j}/s$ to each sampled point. We replace all points in $Ball_{i,j}$ with weighted set $S_{i,j}$. Let $\mathcal{K}_P$ be
the union set of weighted sampled sets returned by this coreset technique, that is $\mathcal{K}_P = \bigcup_{i,j} S_{i,j}$. We also let
partition $\Lambda_P$ be the set of rings $\Lambda_P = \{Ball_{i,j} \setminus Ball_{i,j-1} : i \in [k], j \in [\log n + 1]\}$.

In Step 3 of Algorithm [3] for each ring Balli j\Ballij_i having nt^j points we let sec = f(TT-, d, e, 6) be 
min(ni^j, e~^dklog( ’^*°|^ ). Moreover, we let parameter s in Lemma [19] (which is the size of this coreset 
maintained by merge-and-reduce approach) be 0(k^de^^ log^ n). We now prove a variant of Lemma [T^ 
for this coreset. Observe that to adjust the parameter e in Lemma [T^ for Lemma[2^ we replace e by 

Lemma 27 Let P C be a point set of size n and k G N a parameter. Let Ap = {Balf j : i G 

[k],j G [logn + 1]} be the partition set of balls returned by the coreset technique of Chen f liil/ . Let 
$B = \{b_1, \ldots, b_k\} \subset \mathbb{R}^d$ be an arbitrary set of $k$ centers. Suppose for every ring $Ball_{i,j} \setminus Ball_{i,j-1} \in \Lambda_P$

2 

having points, we delete up to • riij points and let [Ball^\Balli^j--\]' be this ball after deletion of 
these points. We then have 

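$$\sum_{Ball_{i,j} \setminus Ball_{i,j-1} \in \Lambda_P} \big| \mathrm{COST}((Ball_{i,j} \setminus Ball_{i,j-1})', B) - \mathrm{COST}(Ball_{i,j} \setminus Ball_{i,j-1}, B) \big| \le \epsilon \cdot \mathrm{COST}(P, B),$$

where $\mathrm{COST}(Ball_{i,j} \setminus Ball_{i,j-1}, B) = \sum_{p \in Ball_{i,j} \setminus Ball_{i,j-1}} d(p, B)$ and $\mathrm{COST}((Ball_{i,j} \setminus Ball_{i,j-1})', B) = \sum_{p \in (Ball_{i,j} \setminus Ball_{i,j-1})'} d(p, B)$.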

Proof: The proof is in the same spirit as the proof of Lemma 26. Let us fix a ring $Ball_{i,j} \setminus Ball_{i,j-1} \in \Lambda_P$
for i E [k] and j E {0,1, • • • Jogri}. Observe that the radius of Ballij is 2’ • radius 


of Ballij_i is 2^ ^ • 20§I(b£2 p^j. simplicity let us denote Bally \Ballij_i by Rtj and we let £r^ . = 

2> ■ -and let tlr^ . be the number of points in the ring . Assume the nearest center to ring R^ j 

is by . E B. We either have d(Ry,bR. .) > ^ or d(Ry,bR^j) < If we have d(Ry,bRjj) > 

then COST(Ri j, ;8) > ur. . • d(Rij, bR. . ] > tir. . • On the other, since we delete up to ^ • tir^ . points 
of cell Rij, the cost that we lose is at most 

2 2 

y • riR. .(2 £r._. +d(Ry,bR. .)] < y • tlr. . • (2e + l) • d(Ry,bR._.) < e-C0ST(Ry,5) . 

{r- ■ 

Now suppose we have d(Ry, bR^.) < —We have two cases, either j = 0 or j > 0. We first prove 

the lemma when j = 0. For j = 0, ring R^ o is in fact ball Ballt,o of radius £r^ ^ ’ L Since cell c is 

in Gi,o, we have £c = Therefore, d(Ri,o, bR^ J < j . COST(P,C) ^ delete up to 

2 

^ • Ur. q points from Rt o- Therefore, the cost that we lose is at most 


e" 1 C0ST(P, C) . e 


y • nR._„ • d(Rt,o,bR,_J < — ■ tir.^^ • - 


^ 2 ■ 


C0ST(P, C) 


n z ’ TL 

We have i E [k]. Hence a summation over i will find the overall cost that we lose as follows. 


V- e C0ST(P, C) e , 

Y. 1 ■ '^Ri.o-1- ^ T • COST(P, C) . 


ie[k] 


n 


N, 


The second case is when j > 0. Recall that d(Rij, Tr^ .) < —Observe that since j > 0, d(Rtj, Ct) > 
2,-1 ■ C0ST(P,C) ^ = 21 • ‘^QST(p,c) < 2d(Ry,Ci) which means d(Ry,bR._.) < ^ < |•d(Ry,Ct). 

For each such ring Rt^j we delete up to ^ • tir^ ^ points. Thus, taking the summation over all i and j > 0 
yields 


Ic 2 ^ 2 2 

YY Y y• d(Ry,by_^) < Y y 

i—1 j>0 ^i,j i—1 j>0 ^tjCAp 

< e • C0ST(P, C) . 


□ 
B.3 Proof of Lemma 17


Proof of Lemma 17: Recall that we take $(k, \epsilon)$-coreset $B_i$ from its children, which are buckets $B_{i-1}$ and $B'_{i-1}$. Observe that $B_{i-1}$ and $B'_{i-1}$ are also $(k, \epsilon)$-coresets of subtrees rooted at nodes $B_{i-1}$ and $B'_{i-1}$ with partitions $\Lambda_{B_{i-1}}$ and $\Lambda_{B'_{i-1}}$, respectively. Similarly, we take $(k, \epsilon)$-coreset $B_{i-1}$ from its children, which are buckets $B_{i-2}$ and $B'_{i-2}$ that are, in turn, $(k, \epsilon)$-coresets of subtrees rooted at nodes $B_{i-2}$ and $B'_{i-2}$. We let $O_{i-1} = B_{i-2} \cup B'_{i-2}$. Let us fix arbitrary regions $R_{i-1} \in \Lambda_{B_{i-1}}$ and $R_i \in \Lambda_{B_i}$ such that $R_{i-1} \cap R_i \ne \emptyset$.

Recall that we take a sample set of size $r_{i-1} = \min(|O_{i-1} \cap R_{i-1}|, \max(s_{CC}, O(d_{VC}\,\epsilon_{i-1}^{-2} \log(n) \cdot \log(\ldots))))$ points uniformly at random from $O_{i-1} \cap R_{i-1}$, assign a weight of $|O_{i-1} \cap R_{i-1}| / r_{i-1}$ to every sampled point, and add the weighted sampled points to $B_{i-1}$. Here, $s_{CC}$ is from Algorithm 3. This essentially means that $B_{i-1} \cap R_{i-1}$ is an $\epsilon_{i-1}$-sample for $O_{i-1} \cap R_{i-1}$. Since we treat a weighted point $p$ having weight $w_p$ as $w_p$ points at the coordinates of $p$, we have $|B_{i-1} \cap R_{i-1}| = |O_{i-1} \cap R_{i-1}|$. Thus,

$$\big|\,|(O_{i-1} \cap R_{i-1}) \cap R_i| - |(B_{i-1} \cap R_{i-1}) \cap R_i|\,\big| \le \epsilon_{i-1} \cdot |O_{i-1} \cap R_{i-1}|.$$

Observe that for regions $R_i$ and $R_{i-1}$, using Lemma 25, $(P, R_i \cap R_{i-1})$ is a range space of dimension $O(2 d_{VC} \log(2 d_{VC}))$. So, we can write

$$\big|\,|O_{i-1} \cap (R_{i-1} \cap R_i)| - |B_{i-1} \cap (R_{i-1} \cap R_i)|\,\big| \le \epsilon_{i-1} \cdot |O_{i-1} \cap R_{i-1}|.$$

Now let us expand $O_{i-1} \cap (R_{i-1} \cap R_i)$, where we use $O_{i-1} = B_{i-2} \cup B'_{i-2}$. Since $B_{i-2}$ and $B'_{i-2}$ are disjoint, we then have $(B_{i-2} \cup B'_{i-2}) \cap (R_{i-1} \cap R_i) = (B_{i-2} \cap (R_{i-1} \cap R_i)) \cup (B'_{i-2} \cap (R_{i-1} \cap R_i))$.

Similarly, let us consider $(k, \epsilon)$-coreset $B_{i-2}$ with its partition $\Lambda_{B_{i-2}}$, which is a $(k, \epsilon)$-coreset of its children, i.e., buckets $B_{i-3}$ and $B'_{i-3}$. Again, let $O_{i-2} = B_{i-3} \cup B'_{i-3}$. Let us fix an arbitrary region $R_{i-2} \in \Lambda_{B_{i-2}}$ such that $R_{i-2} \cap R_{i-1} \cap R_i \ne \emptyset$. Again, we take a sample set of size $r_{i-2} = \min(|O_{i-2} \cap R_{i-2}|, \max(s_{CC}, O(d_{VC}\,\epsilon_{i-2}^{-2} \log(n) \cdot \log(\ldots))))$ points uniformly at random from $O_{i-2} \cap R_{i-2}$, assign a weight of $|O_{i-2} \cap R_{i-2}| / r_{i-2}$ to every sampled point, and add the weighted sampled points to $B_{i-2}$. Since $B_{i-2} \cap R_{i-2}$ is an $\epsilon_{i-2}$-sample for $O_{i-2} \cap R_{i-2}$ and $|B_{i-2} \cap R_{i-2}| = |O_{i-2} \cap R_{i-2}|$, we then have


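$$\big|\,|(O_{i-2} \cap R_{i-2}) \cap (R_{i-1} \cap R_i)| - |(B_{i-2} \cap R_{i-2}) \cap (R_{i-1} \cap R_i)|\,\big| \le \epsilon_{i-2} \cdot |O_{i-2} \cap R_{i-2}|.$$

Observe that for regions $R_i$, $R_{i-1}$ and $R_{i-2}$, using Lemma 25, $(P, R_i \cap R_{i-1} \cap R_{i-2})$ is a range space of dimension $O(3 d_{VC} \log(3 d_{VC}))$. So, we can write

$$\big|\,|O_{i-2} \cap (R_{i-2} \cap R_{i-1} \cap R_i)| - |B_{i-2} \cap (R_{i-2} \cap R_{i-1} \cap R_i)|\,\big| \le \epsilon_{i-2} \cdot |O_{i-2} \cap R_{i-2}|.$$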

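We do the same for $B'_{i-2}$. We define $O'_{i-2}$ for $B'_{i-2}$ similarly to $O_{i-2}$. Using the triangle inequality we have

$$\Big|\, \sum_{R_{i-2} \in \Lambda_{B_{i-2}}} |O_{i-2} \cap (R_{i-2} \cap R_{i-1} \cap R_i)| + \sum_{R'_{i-2} \in \Lambda_{B'_{i-2}}} |O'_{i-2} \cap (R'_{i-2} \cap R_{i-1} \cap R_i)| - |B_{i-1} \cap (R_{i-1} \cap R_i)| \,\Big|$$

$$\le \epsilon_{i-2} \cdot \Big( \sum_{R_{i-2} \in \Lambda_{B_{i-2}}} |O_{i-2} \cap R_{i-2}| + \sum_{R'_{i-2} \in \Lambda_{B'_{i-2}}} |O'_{i-2} \cap R'_{i-2}| \Big) + \epsilon_{i-1} \cdot |O_{i-1} \cap R_{i-1}|.$$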

Now we recurse from level $i - 2$ down to level 2, in which we have $(k, \epsilon)$-coreset $B_2$ with partition $\Lambda_{B_2}$, which is a $(k, \epsilon)$-coreset of its children, i.e., buckets $B_1$ and $B'_1$. Again, let $O_2 = B_1 \cup B'_1$. Let us fix an arbitrary region $R_2 \in \Lambda_{B_2}$ such that $R_2 \cap \cdots \cap R_i \ne \emptyset$. Once again, we take a sample set of size $r_2 = \min(|O_2 \cap R_2|, \max(s_{CC}, O(d_{VC}\,\epsilon_2^{-2} \log(n) \cdot \log(\ldots))))$ points uniformly at random from $O_2 \cap R_2$, assign a weight of $|O_2 \cap R_2| / r_2$ to every sampled point, and add the weighted sampled points to $B_2$.

Since $B_2 \cap R_2$ is an $\epsilon_2$-sample for $O_2 \cap R_2$ and $|B_2 \cap R_2| = |O_2 \cap R_2|$, we then have

$$\big|\,|(O_2 \cap R_2) \cap (R_3 \cap \cdots \cap R_i)| - |(B_2 \cap R_2) \cap (R_3 \cap \cdots \cap R_i)|\,\big| \le \epsilon_2 \cdot |O_2 \cap R_2|.$$

Observe that for regions $R_i, R_{i-1}, \ldots, R_2$, using Lemma 25, $(P, R_i \cap \cdots \cap R_2)$ is a range space of dimension $O((i - 1) d_{VC} \log((i - 1) d_{VC}))$. So, we can write

$$\big|\,|O_2 \cap (R_2 \cap \cdots \cap R_i)| - |B_2 \cap (R_2 \cap \cdots \cap R_i)|\,\big| \le \epsilon_2 \cdot |O_2 \cap R_2|.$$

By repeated applications of the triangle inequality for levels i — 2 down to 2 we obtain 
I |Ox 2 n (Ri_i nRi)|-|Bi_i n (Ri_i nRi)|| 

node X2 in level 2 

= \ Y. Y Y ■■■ Y IOx,n(R^^nR5^n---nR;r/)nRt_i nRt|-|Bt_i n(R3_i nROI 

X2 in level 2 r^ 2 eAx^_2 

i -2 

< L L =l•{L (|OxjnR|))+er_i -lOi^inRt^il , 

level j—2 node Xj in level j 

where $x_2$ is a child of node $x_3$, $x_3$ is a child of node $x_4$, and so on, and $O_{x_j}$ is the point set at node $x_j$. Observe that in level 1 (i.e., the leaf level) we do not merge buckets, and merging buckets starts at level 2; because of that, the index of the first sum starts with $j = 2$. We take sums $\sum_{R_{i-1} \in \Lambda_{B_{i-1}}}$ and $\sum_{R'_{i-1} \in \Lambda_{B'_{i-1}}}$ to conclude

$$\big|\,|P_i \cap R_i| - |B_i \cap R_i|\,\big| \le \sum_{\text{level } j=2}^{i-1} \; \sum_{\text{node } x_j \text{ in level } j} \epsilon_j \Big( \sum_{R \in \Lambda_{x_j}} |O_{x_j} \cap R| \Big).$$

In Algorithm 3 we simply replace VC-dimension $d_{VC}$ by $O(d_{VC} \log n)$ for all levels $i \in [\log n]$. □